🐛 improve early request collection reliability in Firefox#4573
Conversation
a3581e4 to
8558f01
Compare
When a fetch resolves, two things happen around the same time: - the browser dispatches a PerformanceResourceTiming entry to our observer - afterSend() resumes from `await responsePromise` and notifies REQUEST_COMPLETED On Firefox the observer task can fire before the resolve microtask runs, so the request isn't in the registry yet when assembleResource() looks it up, and the resource event ends up without handlingStack / requestInit / etc. Defer the registry lookup for request-type entries by REQUEST_MATCHING_DELAY (50ms) so the matching REQUEST_COMPLETED has time to land. assembleResource now takes the resolved request directly instead of pulling it from the registry. Tests drive the delay through mockClock.tick().
8558f01 to
b645dab
Compare
Bundles Sizes Evolution
🚀 CPU Performance
🧠 Memory Performance
|
🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: 7636e78 | Docs | Datadog PR Page | Give us feedback! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b645dabc27
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| setTimeout( | ||
| () => handleResource(() => assembleResource(entry, requestRegistry.getMatchingRequest(entry), configuration)), | ||
| REQUEST_MATCHING_DELAY |
There was a problem hiding this comment.
Cancel delayed matching timers on stop
For request-type entries this timer can still fire after startResourceCollection().stop() has been called; taskQueue.stop() only clears work already queued, so the delayed callback will call handleResource() later and push a new task that can emit a stale resource event after the collection was stopped, especially during SDK teardown/re-init with in-flight fetch/XHR entries. Store and clear these timeout ids, or guard the callback with a stopped flag.
Useful? React with 👍 / 👎.
mormubis
left a comment
There was a problem hiding this comment.
Looks good! One thing: we should clear the pending setTimeout timers in stop() so a delayed callback doesn't push a stale task after teardown. The batch stops before resourceCollection.stop(), so the assembly pipeline is still listening when the timer fires. What do you think?
|
/merge |
|
View all feedbacks in Devflow UI.
|
|
/merge -c |
|
View all feedbacks in Devflow UI.
This merge request is not in the queue and can't be unqueued To get help about command usage, write If you need support, contact us on Slack #devflow with those details! |
Motivation
The microfrontend e2e test
microfrontend > RUM > with beforeSend > expose handling stack for fetch requests › asyncfails ~70% of the time onfirefox-pinned(10 failed / 6 flaky / 4 passed out of 20 reproductions). The same race affects any fetch resource event in production:event.context.handlingStack(and the rest of the request-derived domain context —requestInit,response,error, …) is intermittentlyundefinedfor fetch resources on Firefox.Root cause
When a fetch resolves, two things happen around the same time:
PerformanceResourceTimingentry to our observerafterSend()resumes fromawait responsePromiseand notifiesREQUEST_COMPLETEDOn Firefox the observer task can fire before the resolve microtask runs, so the request isn't in the
requestRegistryyet whenassembleResource()looks it up viarequestRegistry.getMatchingRequest(). The lookup returnsundefinedand the resulting resource event is missing all request-derived domain context.This was introduced by #4285
always collect early requests(commit2b71062e7), which switched from request-driven (performance.getEntriesByName()atREQUEST_COMPLETEDtime) to entry-driven matching.CI logs of the failing case make the ordering obvious:
Changes
startResourceCollection: for request-type entries (fetch/xmlhttprequestinitiator), defer the registry lookup byREQUEST_MATCHING_DELAY(50 ms) viasetTimeout. In practice the matchingREQUEST_COMPLETEDarrives within ~30 ms even on slow CI runners, so 50 ms is a generous bound.assembleResourcenow takes the resolvedrequestdirectly instead of pulling it from the registry — the matching policy is owned by the call site where the timing context is clear.requestRegistryis unchanged;getMatchingRequestis still synchronous.mockClock.tick(REQUEST_MATCHING_DELAY)insiderunTasks.Test instructions
Repro on the e2e suite (Firefox is the most reliable, but the race is browser-agnostic):
CI validation (e2e job scoped to the failing test,
--repeat-each=20):Checklist